fix(extraction): bump EXTRACTION_MAX_TOKENS 4096 → 8192 #4
Draft
moralespanitz wants to merge 1 commit into main
Conversation
The extraction LLM was truncating JSON output at ~14 KB during BEAM Sprint 2 CR mini-slice runs on dense 10-turn chunks. The server log showed `[extractFacts] JSON parse failed (Unterminated string in JSON at position 14152 ...); attempting repair` across 6 chunks of one ingest, causing iter 7 (first attempt) to crash on conv-3.

The Anthropic max_tokens budget defaults to 4096 in extraction.ts. Going to 8192 doubles the headroom for JSON output without changing any other behavior. Cost impact is marginal: Anthropic bills only for tokens actually generated, and it is rare for extraction to use the full 8192.

Validation: the server is running with this change locally, and iter 7 v3 N=3 full-ingest reruns succeed without truncation.

A companion harness mitigation lowered chunk size from 10 to 5 turn-pairs (in atomicmemory-benchmarks PR #8) to reduce the chance of hitting the limit at all; this server-side bump is defense-in-depth.
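For reference, the change itself is a one-line constant bump. A minimal sketch of how the constant feeds the Anthropic call, assuming extraction.ts passes it straight into the standard `@anthropic-ai/sdk` messages API (the surrounding names and model string are illustrative, not the actual file contents):

```ts
import Anthropic from "@anthropic-ai/sdk";

// Output-token budget for the extraction call. Was 4096, which dense
// 10-turn chunks could exhaust mid-JSON; 8192 doubles the headroom.
const EXTRACTION_MAX_TOKENS = 8192;

const anthropic = new Anthropic();

async function extractFacts(chunkText: string) {
  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-latest", // illustrative model name
    max_tokens: EXTRACTION_MAX_TOKENS, // the only value this PR changes
    messages: [{ role: "user", content: chunkText }],
  });
  return response;
}
```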
Summary
The extraction LLM was truncating JSON output at ~14 KB during BEAM Sprint 2 CR mini-slice runs on dense 10-turn chunks. Bumping the max_tokens budget from 4096 → 8192 prevents the truncation.
Evidence
Server log during iter 7 (first attempt) before this fix:
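```
[extractFacts] JSON parse failed (Unterminated string in JSON at position 14152 ...); attempting repair
```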
Six truncations on one ingest pass. Conv-3 crashed.
After bumping to 8192: zero truncation across iter 7 v3 N=3 full-ingest reruns.
Risk
Marginal cost increase: Anthropic bills only for output tokens actually generated, and only the dense chunks that previously truncated will use more tokens. Most extractions stay well under 4096.
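If we want to verify the cost claim empirically, a hedged sketch that logs actual output-token usage per extraction call and flags any call that still hits the ceiling (this relies on the `usage` and `stop_reason` fields the Anthropic messages API returns; the wrapper and model names are illustrative):

```ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();

// Illustrative check: log how many output tokens each extraction actually
// uses, and warn if a call ever runs into the new 8192 ceiling.
async function extractWithUsageLog(chunkText: string) {
  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-latest", // illustrative model name
    max_tokens: 8192,
    messages: [{ role: "user", content: chunkText }],
  });
  const hitCeiling = response.stop_reason === "max_tokens";
  console.log(
    `[extractFacts] output_tokens=${response.usage.output_tokens}` +
      (hitCeiling ? " (hit max_tokens ceiling)" : "")
  );
  return response;
}
```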
Companion changes (separate PRs)
- atomicmemory-benchmarks PR #8: harness mitigation that lowers chunk size from 10 to 5 turn-pairs, reducing the chance of hitting the token limit at all.
Test plan
- `npx tsc --noEmit` — clean
- `npm test` — run with `--no-cache`